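- Group roster
- Final project title
- Data to be analyzed
- Data source
- Data format
- Questions to be analyzed
- Hypotheses
- Expected results
- What problems the analysis results can solve
- 5/2 (Mon) 11:59pm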
- http://goo.gl/forms/8UgvNQlHVp
May 2, 2016
Click Raw, then right-click and choose Save As
Slide download:
Right-click and choose Save As
# Load the SportsAnalytics package (install it first if it is missing)
if (!require('SportsAnalytics')){
  install.packages("SportsAnalytics")
  library(SportsAnalytics)
}
# Fetch player statistics for the 2014-2015 season
NBA1415<-fetch_NBAPlayerStatistics("14-15")
One dimension
- summary(NBA1415$TotalPoints)
- boxplot(NBA1415$TotalPoints)
- hist(NBA1415$TotalPoints)
- density(NBA1415$TotalPoints)
- barplot(table(NBA1415$Team))
Two dimensions
- plot(x,y)
> 2 dimensions
- 3D-like plots
# value to plot (TotalPoints) ~ grouping variable (Team)
boxplot(TotalPoints ~ Team, data = NBA1415, col = "red")
# mfrow sets how many subplots one figure holds; mar sets the margin sizes
par(mfrow = c(2, 1), mar = c(4, 4, 2, 1)) # 2 x 1 subplots in one figure
hist(subset(NBA1415, Team == "SAN")$TotalPoints, col = "green")
hist(subset(NBA1415, Team == "GSW")$TotalPoints, col = "green")
par(mfrow = c(1, 1)) # back to a single plot per figure
# Scatterplot with TotalMinutesPlayed on x and TotalPoints on y
plot(NBA1415$TotalMinutesPlayed, NBA1415$TotalPoints)
# Horizontal line at h = 500, width lwd = 2, line type lty = 2 (dashed)
abline(h = 500, lwd = 2, lty = 2)
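Use color to add a third dimension of information to a two-dimensional scatterplot
# col = NBA1415$Team colors the points by team, so players on different teams get different colors
plot(NBA1415$TotalMinutesPlayed, NBA1415$TotalPoints, col = NBA1415$Team)
abline(h = 500, lwd = 2, lty = 2)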
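The colored scatterplot above has no legend, so a reader cannot tell which color belongs to which group. The following is a minimal sketch of one way to add a legend. It uses the built-in mtcars data as a stand-in (an assumption, so the example runs without downloading NBA1415), with cyl playing the role of Team.

```r
# Stand-in data: built-in mtcars instead of NBA1415 (assumption for a self-contained example)
grp <- factor(mtcars$cyl)   # grouping variable, plays the role of Team

# A factor passed to col is used through its integer codes 1, 2, 3, ...
plot(mtcars$wt, mtcars$mpg, col = grp, pch = 19,
     xlab = "wt", ylab = "mpg")

# Legend: level i was drawn with color number i of the current palette
legend("topright", legend = levels(grp),
       col = seq_along(levels(grp)), pch = 19, title = "cyl")
```

The default palette has only a handful of colors, so with many groups (30 teams, for example) colors repeat and a legend alone will not make the groups distinguishable.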
# mfrow sets how many subplots one figure holds; mar sets the margin sizes
par(mfrow = c(1, 2), mar = c(5, 4, 2, 1)) # 1 x 2 subplots in one figure
with(subset(NBA1415, Team == "SAN"), # rows of NBA1415 where the team is SAN
     plot(TotalMinutesPlayed, TotalPoints, main = "SAN")) # main = plot title
with(subset(NBA1415, Team == "GSW"),
     plot(TotalMinutesPlayed, TotalPoints, main = "GSW"))
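Settings changed with par() stay in effect for every later plot on the same device. A common pattern, sketched here with the built-in mtcars data as a stand-in for NBA1415 (an assumption), is to save the old settings and restore them when the multi-panel figure is done.

```r
# Stand-in data: built-in mtcars (assumption; replaces NBA1415 so the code runs offline)
old_par <- par(mfrow = c(1, 2), mar = c(5, 4, 2, 1))  # par() returns the previous values

with(subset(mtcars, am == 0),
     plot(wt, mpg, main = "automatic"))
with(subset(mtcars, am == 1),
     plot(wt, mpg, main = "manual"))

par(old_par)  # restore the previous layout and margins
```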
Exploratory plots are "quick and dirty"
Let you summarize the data (usually graphically) and highlight any broad features
Explore basic questions and hypotheses (and perhaps rule them out)
Suggest modeling strategies for the "next step"
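Includes the following packages:
- lattice: contains the plotting functions, e.g. xyplot, bwplot, levelplot
- grid: the foundation the lattice package is built on
A single function call draws the whole plot; annotations and text cannot be added afterwards (unlike the base plotting system)
- xyplot: scatterplots
- bwplot: box-and-whiskers plots ("boxplots")
- histogram: histograms
- stripplot: like a boxplot, but with the actual points
- dotplot: dots on "violin strings"
- splom: scatterplot matrix
- levelplot, contourplot: for plotting "image" data
xyplot(y ~ x | f * g, data)
- y ~ x: formula notation, y-axis variable ~ x-axis variable
- f, g: conditioning variables (optional)
- data: the data frame to use
library(lattice)
library(datasets)
## Simple scatterplot
xyplot(Ozone ~ Wind, data = airquality) # y-axis ~ x-axis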
library(datasets)
library(lattice)
## Convert 'Month' to a factor variable
airquality <- transform(airquality, Month = factor(Month))
xyplot(Ozone ~ Wind | Month, # y-axis ~ x-axis | grouping variable
       data = airquality, layout = c(5, 1)) # 5 columns, 1 row
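The general form xyplot(y ~ x | f * g, data) accepts two conditioning variables, while the airquality example uses only one. The following is a small sketch of the two-variable case. It uses the built-in mtcars data (an assumption, picked because it has two natural grouping columns), and the factor labels are illustrative.

```r
library(lattice)

# Stand-in data: built-in mtcars, with cyl and am turned into labeled factors
cars <- transform(mtcars,
                  cyl = factor(cyl, labels = c("4 cyl", "6 cyl", "8 cyl")),
                  am  = factor(am,  labels = c("automatic", "manual")))

# y ~ x | f * g: one panel for each combination of cyl and am
p <- xyplot(mpg ~ wt | cyl * am, data = cars,
            layout = c(3, 2))   # 3 columns, 2 rows
print(p)
```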
p <- xyplot(Ozone ~ Wind, data = airquality) ## Nothing happens!
print(p) ## Plot appears
xyplot(Ozone ~ Wind, data = airquality) ## Auto-printing
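Because a lattice call returns an object instead of drawing directly, the plot has to be printed explicitly wherever auto-printing is off, for example inside a function, a loop, or a script run with source(). The following is a minimal sketch that writes the plot to a PNG file; the file name is hypothetical.

```r
library(lattice)
library(datasets)

p <- xyplot(Ozone ~ Wind, data = airquality)  # builds the plot object, draws nothing
class(p)                                      # the object is of class "trellis"

# Write the plot to a file: print() is what actually draws it on the open device
out_file <- file.path(tempdir(), "ozone_wind.png")  # hypothetical output path
png(out_file)
print(p)
dev.off()
file.exists(out_file)
```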
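- lattice: the plot is created by one call to a core function (e.g. xyplot)
- base: start with the plot function (or similar), then add annotations (text, lines, points, axis)
- lattice: a single function call (xyplot, bwplot, etc.)
- ggplot2: mixes elements of base and lattice; plot components are combined with +
- An implementation of The Grammar of Graphics by Leland Wilkinson
- Written by Hadley Wickham (while he was a graduate student at Iowa State)
- A third graphics system for R (along with base and lattice)
- Grammar of graphics represents an abstraction of graphics ideas/objects
- “In brief, the grammar tells us that a statistical graphic is a mapping from data to aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars). The plot may also contain statistical transformations of the data and is drawn on a specific coordinate system” (ggplot2 book)
Basic elements:
Other elements include: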
qplot()
- Works much like the plot function in the base graphics system
- Plots are made up of aesthetics (size, shape, color) and geoms (points, lines)
- Factors passed to qplot() should be labeled
- qplot() hides what goes on underneath, which is okay for most operations
- ggplot() is the core function and very flexible for doing things qplot() cannot do
library(ggplot2) # remember to load the ggplot2 package; install it first if it is missing
# qplot(x, y, data = data frame) draws a scatterplot
qplot(FieldGoalsAttempted, TotalPoints, data = NBA1415)
# color = Position: color the points by playing position
qplot(FieldGoalsAttempted, TotalPoints, data = NBA1415, color = Position)
# geom = c("point", "smooth"): draw the points plus a smoothed trend line
qplot(FieldGoalsAttempted, TotalPoints, data = NBA1415,
      geom = c("point", "smooth"))
# Histogram of TotalPoints; fill = Position colors the bars by playing position
qplot(TotalPoints, data = NBA1415, fill = Position)
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
# qplot(x, y, data = data frame) draws a scatterplot
# facets = . ~ Position: one panel per playing position, laid out horizontally
qplot(FieldGoalsAttempted, TotalPoints, data = NBA1415,
      facets = . ~ Position)
# facets = Position ~ .: one panel per playing position, laid out vertically
qplot(FieldGoalsAttempted, TotalPoints, data = NBA1415,
      facets = Position ~ .)
qplot(hwy, data = mpg, facets = drv ~ ., binwidth = 2)
# facets = Position ~ .: one panel per playing position, laid out vertically
# binwidth = 100: each bin spans 100 points
qplot(TotalPoints, data = NBA1415,
      facets = Position ~ ., binwidth = 100)
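Every qplot() call above can also be written with ggplot(), which matters because recent ggplot2 releases mark qplot() as deprecated. As a sketch, here is the faceted histogram of hwy rewritten with the core functions, using the mpg data that ships with ggplot2.

```r
library(ggplot2)

# Same plot as qplot(hwy, data = mpg, facets = drv ~ ., binwidth = 2),
# spelled out with ggplot(); mpg is a dataset bundled with ggplot2
p <- ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2) +
  facet_grid(drv ~ .)   # rows = drv, so the panels are stacked vertically
print(p)
```

The mapping is direct: the first arguments of qplot() become aes(), the implied geom becomes an explicit geom_*() layer, and facets becomes facet_grid().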
- The qplot() function is the analog to plot() but with many built-in features
- Syntax somewhere in between base/lattice
- ggplot2 book by Hadley Wickham
- R Graphics Cookbook by Winston Chang (examples in base plots and in ggplot2)
- Grammar of Graphics by Leland Wilkinson
- data frame
- aesthetic mappings: how data are mapped to color, size
- geoms: geometric objects like points, lines, shapes
- facets: for conditional plots
- stats: statistical transformations like binning, quantiles, smoothing
- scales: what scale an aesthetic map uses (example: male = red, female = blue)
- coordinate system
- xlab(), ylab(), labs(), ggtitle()
- theme()
- theme(legend.position = "none")
- theme_gray(): The default theme (gray background)
- theme_bw(): More stark/plain
Remember to load the ggplot2 package
# aes: aesthetic attributes (color, shape, point size, line width)
# geom_*: geometric objects (points, lines, boxplots, bar charts)
ggplot(NBA1415, aes(x = Position, y = TotalPoints)) + geom_point()
Remember to load the ggplot2 package
# aes: aesthetic attributes (color, shape, point size, line width)
# geom_*: geometric objects (points, lines, boxplots, bar charts)
ggplot(NBA1415, aes(x = Position, y = TotalPoints)) + geom_boxplot()
# facet_grid: add subplots; Position ~ . stacks them vertically, . ~ Position places them side by side
ggplot(NBA1415, aes(x = FieldGoalsAttempted, y = TotalPoints)) +
  geom_point() + facet_grid(Position ~ .)
# geom_smooth: add a trend line; method = 'lm' fits a linear regression
ggplot(NBA1415, aes(x = FieldGoalsAttempted, y = TotalPoints)) +
  geom_point() + facet_grid(Position ~ .) + geom_smooth(method = 'lm')
# color = Position: use Position to color the points and lines
ggplot(NBA1415, aes(x = FieldGoalsAttempted, y = TotalPoints, color = Position)) +
  geom_point() + geom_smooth(method = 'lm')
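The annotation helpers listed earlier (labs(), theme_bw(), theme(legend.position = ...)) are not used in the NBA examples. The following is a short sketch of how they combine with a plot. It uses the built-in mtcars data as a stand-in (an assumption), and the label texts are illustrative.

```r
library(ggplot2)

# Stand-in data: built-in mtcars (assumption; NBA1415 needs a download)
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  labs(title = "Fuel economy vs. weight",
       x = "Weight (1000 lbs)", y = "Miles per gallon",
       color = "Cylinders") +            # also renames the legend
  theme_bw() +                           # plain theme with a white background
  theme(legend.position = "bottom")      # "none" would hide the legend instead
print(p)
```

The theme() call comes after theme_bw() on purpose: a complete theme such as theme_bw() resets earlier theme() adjustments.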
testdat <- data.frame(x = 1:100, y = rnorm(100))
testdat[50,2] <- 100 ## Outlier!
plot(testdat$x, testdat$y, type = "l", ylim = c(-3,3))
g <- ggplot(testdat, aes(x = x, y = y))
g + geom_line()
g + geom_line() + ylim(-3, 3)
g + geom_line() + coord_cartesian(ylim = c(-3, 3))
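The last two calls look alike but behave differently: ylim() sets scale limits, so values outside the range are discarded before drawing, while coord_cartesian() only zooms the view. The sketch below makes the difference visible by inspecting the data each layer would draw. The fixed seed and the use of layer_data() are additions for illustration.

```r
library(ggplot2)

set.seed(1)  # fixed seed so the example is reproducible
testdat <- data.frame(x = 1:100, y = rnorm(100))
testdat[50, 2] <- 100  # outlier

g <- ggplot(testdat, aes(x = x, y = y)) + geom_line()

# layer_data() returns the data frame a layer ends up drawing
clipped <- layer_data(g + ylim(-3, 3))                       # scale limits
zoomed  <- layer_data(g + coord_cartesian(ylim = c(-3, 3)))  # coordinate zoom

anyNA(clipped$y)  # TRUE when out-of-range values were turned into NA
anyNA(zoomed$y)   # FALSE when all values, including the outlier, are kept
```

With ylim() the line has a gap around x = 50, because the outlier is gone. With coord_cartesian() the line runs off the top of the panel and comes back.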
choroplethr package
- A tool built on the ggplot2 package specifically for drawing choropleth maps
if (!require('choroplethr')){
  install.packages("choroplethr")
  library(choroplethr)
}
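Uses the choroplethr package; remember to load it first
data(df_pop_state) # population of each US state
state_choropleth(df_pop_state) # draw each state's population on the map
Uses the choroplethr package; remember to load it first
data(continental_us_states)
state_choropleth(df_pop_state, reference_map = TRUE,
                 zoom = continental_us_states) # draw each state's population on the map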
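df_pop_state is a plain data frame, so other state-level data can be mapped the same way. The sketch below assumes that state_choropleth() looks for a column named region (lowercase full state names) and a column named value, the layout df_pop_state uses. It builds such a data frame from state.name and state.x77, which ship with base R; the title text is illustrative.

```r
# Build a region/value data frame from datasets bundled with base R
# (the Income column of state.x77 is per capita income from 1974)
income <- data.frame(
  region = tolower(state.name),            # lowercase full state names
  value  = unname(state.x77[, "Income"]),  # the number to map
  stringsAsFactors = FALSE
)
head(income)

# Draw the map only when choroplethr is installed
if (requireNamespace("choroplethr", quietly = TRUE)) {
  library(choroplethr)
  print(state_choropleth(income, title = "Per capita income (1974)"))
}
```

state.name covers the 50 states only, so the District of Columbia has no value here, and the package may warn about the missing region.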
Draw maps with the choroplethr package; the data come from the WDI package
if (!require('WDI')){
  install.packages("WDI")
  library(WDI)
}
choroplethr_wdi(code="SP.POP.TOTL", year=2014,
                title="2014 Population", num_colors=1)
- Draw maps with the choroplethr package; the data come from the WDI package
- WDI: World Development Indicators, a large collection of open data
choroplethr_wdi(code="SP.DYN.LE00.IN", year=2014,
                title="2014 Life Expectancy")
- Draw maps with the choroplethr package; the data come from the WDI package
- WDI: World Development Indicators
choroplethr_wdi(code="SP.POP.TOTL", year=2014,
                title="2014 Population",
                zoom=c('taiwan','japan','south korea','philippines'))
Uses the readShapeSpatial function from the maptools package
if (!require('rgdal')){
install.packages("rgdal");library(rgdal)
}
if (!require('gpclib')){
install.packages("gpclib");library(gpclib)
}
if (!require('rgeos')){
install.packages("rgeos");library(rgeos)
}
if (!require('maptools')){
install.packages("maptools");library(maptools)
}
tw_shp <- readShapeSpatial("TWN_adm/TWN_adm2.shp")
names(tw_shp) # list the names of the fields in tw_shp
## [1] "ID_0" "ISO" "NAME_0" "ID_1" "NAME_1"
## [6] "ID_2" "NAME_2" "VARNAME_2" "NL_NAME_2" "HASC_2"
## [11] "CC_2" "TYPE_2" "ENGTYPE_2" "VALIDFR_2" "VALIDTO_2"
## [16] "REMARKS_2" "Shape_Leng" "Shape_Area"
rgdal, rgeos, gpclib
print(tw_shp$NAME_2)
## [1] Kaohsiung City Taipei City Changhwa Chiayi
## [5] Hsinchu Hualien Ilan Kaohsiung
## [9] Keelung City Miaoli Nantou Penghu
## [13] Pingtung Taichung Taichung City Tainan
## [17] Tainan City Taipei Taitung Taoyuan
## [21] Yunlin
## 21 Levels: Changhwa Chiayi Hsinchu Hualien Ilan ... Yunlin
tw_shp.df <- fortify(tw_shp, region = "ID_2")
head(tw_shp.df)
##       long      lat order  hole piece    id   group
## 1 120.2390 22.75155     1 FALSE     1 33637 33637.1
## 2 120.2701 22.74135     2 FALSE     1 33637 33637.1
## 3 120.2996 22.70920     3 FALSE     1 33637 33637.1
## 4 120.3148 22.64980     4 FALSE     1 33637 33637.1
## 5 120.3168 22.61033     5 FALSE     1 33637 33637.1
## 6 120.3009 22.60195     6 FALSE     1 33637 33637.1
# Create fake data to plot
mydata<-data.frame(NAME_2=tw_shp$NAME_2, id=tw_shp$ID_2,
prevalence=1:length(tw_shp$NAME_2))
head(mydata)
##           NAME_2    id prevalence
## 1 Kaohsiung City 33637          1
## 2    Taipei City 33638          2
## 3       Changhwa 33639          3
## 4         Chiayi 33640          4
## 5        Hsinchu 33641          5
## 6        Hualien 33642          6
final.plot<-merge(tw_shp.df,mydata,by="id",all.x=T)
head(final.plot)
##      id     long      lat order  hole piece   group         NAME_2
## 1 33637 120.2390 22.75155     1 FALSE     1 33637.1 Kaohsiung City
## 2 33637 120.2701 22.74135     2 FALSE     1 33637.1 Kaohsiung City
## 3 33637 120.2996 22.70920     3 FALSE     1 33637.1 Kaohsiung City
## 4 33637 120.3148 22.64980     4 FALSE     1 33637.1 Kaohsiung City
## 5 33637 120.3168 22.61033     5 FALSE     1 33637.1 Kaohsiung City
## 6 33637 120.3009 22.60195     6 FALSE     1 33637.1 Kaohsiung City
##   prevalence
## 1          1
## 2          1
## 3          1
## 4          1
## 5          1
## 6          1
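The merge step attaches one value per region to every polygon vertex. Two details are easy to miss: all.x = TRUE keeps vertices whose region has no matching row (they get NA), and merge() does not promise to keep the original row order, which geom_polygon relies on. The following is a toy sketch with made-up coordinates and region ids.

```r
# Toy polygon vertices (hypothetical values) for two regions
shape_df <- data.frame(id    = c("B", "B", "A", "A", "A"),
                       order = c(4, 5, 1, 2, 3),
                       long  = c(121.0, 121.1, 120.2, 120.3, 120.3),
                       lat   = c(24.0, 24.1, 22.7, 22.7, 22.6))

# Attribute table that covers region A only
attr_df <- data.frame(id = "A", prevalence = 10)

# all.x = TRUE keeps every vertex; region B gets NA for prevalence
merged <- merge(shape_df, attr_df, by = "id", all.x = TRUE)

# Restore the drawing order before handing the data to geom_polygon
merged <- merged[order(merged$order), ]
merged
```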
library(RColorBrewer) # color palettes, e.g. brewer.pal(9, "Reds")
twmap<-ggplot() +
geom_polygon(data = final.plot,
aes(x = long, y = lat, group = group,
fill = prevalence),
color = "black", size = 0.25) +
coord_map()+
scale_fill_gradientn( colours = brewer.pal(9,"Reds"))+
theme_void()+
labs(title="Prevalence of X in Taiwan")
twmap
if (!require('ggmap')){
install.packages("ggmap")
library(ggmap)
}
twmap <- get_map(location = 'Taiwan', zoom = 7,language = "zh-TW")
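# location: a place name or longitude/latitude coordinates
# zoom: zoom level
# language: language of the map labels
ggmap(twmap) # a ggplot2-based object, so it can be handled the same way
Taipei water quality map: data processing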
library(jsonlite)
WaterData<-fromJSON("http://data.taipei/opendata/datalist/apiAccess?scope=resourceAquire&rid=190796c8-7c56-42e0-8068-39242b8ec927")
WaterDataFrame<-WaterData$result$results
WaterDataFrame$longitude<-as.numeric(WaterDataFrame$longitude)
WaterDataFrame$latitude<-as.numeric(WaterDataFrame$latitude)
WaterDataFrame$qua_cntu<-as.numeric(WaterDataFrame$qua_cntu)
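The three as.numeric() calls are needed because the JSON fields arrive as text. Any entry that is not a valid number becomes NA, so it helps to check for that before plotting. The following is a minimal sketch on a hypothetical three-row data frame that imitates the API fields (the values are invented for illustration).

```r
# Hypothetical records that mimic the text fields returned by the API
demo <- data.frame(longitude = c("121.52", "121.56", ""),
                   latitude  = c("25.04", "25.08", "25.10"),
                   qua_cntu  = c("0.3", "N/A", "0.5"),
                   stringsAsFactors = FALSE)

# Convert all three columns at once; text that is not a number becomes NA
num_cols <- c("longitude", "latitude", "qua_cntu")
demo[num_cols] <- lapply(demo[num_cols],
                         function(col) suppressWarnings(as.numeric(col)))

# Keep only the rows where every converted field is usable
clean <- demo[complete.cases(demo[num_cols]), ]
clean
```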
Taipei water quality map: plotting
library(ggmap)
TaipeiMap = get_map(location = c(121.43,24.93,121.62,25.19),
zoom = 11, maptype = 'roadmap')
TaipeiMapO = ggmap(TaipeiMap)+
geom_point(data=subset(WaterDataFrame,qua_cntu>=0),
aes(x=longitude, y=latitude,color=qua_cntu,size=3.5))+
scale_color_continuous(low = "yellow",high = "red")+
guides(size=FALSE)
Taipei water quality map: plotting
TaipeiMapO